1. Introduction

Most people reading this article will recognize the sequence 

(1)    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, ...

of prime numbers. The primes are defined to be the natural or counting numbers
with exactly two factors, namely 1 and themselves, or equivalently the numbers
only divisible by themselves and 1. Here are the first 10 primes starting at a
billion:

(2)    1000000007, 1000000009, 1000000021, 1000000033, 1000000087,
       1000000093, 1000000097, 1000000103, 1000000123, 1000000181.

You can easily find these primes yourself on your own computer using a mathematical
package such as Mathematica or Maple. Further, no matter where you start
looking, even at numbers with hundreds of digits, you will have no trouble finding
primes. Based on this experimental evidence, we can safely and scientifically
conclude that the sequence of primes never ends.
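If no mathematical package is at hand, the experiment can also be run in a few lines of Python. The following is a minimal sketch using plain trial division; the function names `is_prime` and `primes_from` are illustrative choices, not anything from the article.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate near 10**9 (at most about 16,000 odd divisors)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def primes_from(start: int, count: int) -> list[int]:
    """Return the first `count` primes that are >= start."""
    found = []
    n = start
    while len(found) < count:
        if is_prime(n):
            found.append(n)
        n += 1
    return found


# The first 10 primes starting at a billion, to compare with the list in (2).
print(primes_from(10**9, 10))
```

Trial division is far too slow for numbers with hundreds of digits; there a probabilistic primality test would be the usual choice.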

Scientific or Empirical Observation. There are infinitely many prime numbers. 



Is this observation a true fact? Certainly it is easy to verify that no matter 
where you look among the numbers you find plenty of primes. From a scientific 
point of view you can do billions of experiments with your mathematical package 
and always find primes. It can be tested more often and more precisely than any 
law of physics. You can safely bet the family farm on this and still sleep soundly at 
night. And yet, I think many of you will agree with me that in this case scientific 
observation and experimental evidence is a sorry excuse for real knowledge. It may 
be acceptable for a court of law or everyday life, but it is totally unacceptable 
given that you can use pure logical reasoning and a few basic axioms for numbers 
to conclude that this is not just an empirical observation, but a fact built into the 
structure of whole numbers themselves. It was the genius of the ancient Greeks to
develop mathematics not just as an empirical science, but as an axiomatic system
for logical deductions. In Euclid we find in place of the above scientific observation
the following deduction.



While one could argue that 1 itself should be a prime, most mathematicians prefer to put 1 
in a class by itself. 

The notion of proof may go back much further to the Babylonians.

Theorem. There are infinitely many prime numbers.
Proof. Given primes $p_1, p_2, \ldots, p_n$, the number

(3)    $N = (p_1 p_2 p_3 \cdots p_n) + 1$

must contain a prime factor not among the primes used in its construction. To see
this, notice $p_1$ does not divide $N$ since it leaves a remainder of 1 (or alternatively
$N/p_1$ is clearly not an integer). Similarly the other $p_i$ do not divide $N$. We
therefore conclude that any finite list of primes is not complete, and therefore there
must be infinitely many primes. As an example, if we start with the primes 2, 3,
and 29, we find $N = 2 \cdot 3 \cdot 29 + 1 = 175 = 5^2 \cdot 7$ and thus $N$ contains the two new
primes 5 and 7.
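The construction in the proof is easy to try out. The sketch below forms $N$ from a given list of primes and factors it by trial division; `prime_factors` and `euclid_new_primes` are hypothetical helper names introduced only for this illustration.

```python
from math import prod


def prime_factors(n: int) -> set[int]:
    """Set of distinct prime factors of n, found by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def euclid_new_primes(primes: list[int]) -> tuple[int, set[int]]:
    """Return N = (product of the given primes) + 1 and the prime factors of N."""
    n = prod(primes) + 1
    return n, prime_factors(n)


n, new_primes = euclid_new_primes([2, 3, 29])
print(n, sorted(new_primes))
```

Whatever list you start from, none of the primes returned can appear in the input, because each input prime leaves remainder 1 when dividing $N$.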

Now consider the sequence 

(4)    3, 5, 7, 11, 13, 17, 19, 29, 31, 41, 43, 59, 61, 71, 73, 101, 103, ....

These are also prime numbers, but only the prime numbers that occur as pairs of
primes that are two apart. This is the sequence of twin primes. Once again we
might wonder if twin primes continue to occur or eventually die out. Looking for
twin primes starting at a billion, we already see one pair in (2), and we readily find

1000000007, 1000000009, 1000000409, 1000000411, 1000000931, 1000000933, 

1000001447, 1000001449, 1000001789, 1000001791, 1000001801, 1000001803. 
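A search of this kind can be sketched in Python as follows. This is only an illustration using trial division; the helper names `is_prime` and `twin_pairs_from` are assumptions of the sketch, and the search simply walks through the odd numbers from the starting point.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for numbers near 10**9."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def twin_pairs_from(start: int, count: int) -> list[tuple[int, int]]:
    """Return the first `count` pairs (p, p + 2) of primes with p >= start."""
    pairs = []
    n = start if start % 2 == 1 else start + 1  # twin primes are odd
    while len(pairs) < count:
        if is_prime(n) and is_prime(n + 2):
            pairs.append((n, n + 2))
        n += 2
    return pairs


# Twin prime pairs starting at a billion.
for pair in twin_pairs_from(10**9, 6):
    print(pair)
```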

There is nothing special about a billion, and you can check that you always find 
twin primes, although for numbers with hundreds of digits they do not occur nearly 
as frequently as the prime numbers. This evidence points strongly to the following 
conclusion. 

Scientific or Empirical Observation. There are infinitely many twin primes. 

This is such a natural observation that it is hard to believe that the Greeks 
did not discover it. Strangely however, the first known published reference to this 
question was made by A. de Polignac in 1849, who conjectured that there will be 
infinitely many prime pairs with any given even difference. Once again, empirically 
one can sleep soundly after betting the farm that this observation is true, but unlike 
for the infinitude of primes, no one has found a string of logical reasoning that 
demonstrates its truth is built into the structure of the integers. Mathematicians 
like challenges, and often give names to challenging unsolved problems. 

Twin Prime Conjecture. There are infinitely many twin primes. 

You are welcome to try to prove this conjecture and become famous, but be
warned that a great deal of effort has already been expended on this problem. The
chance that a simple idea such as (3) will work is very small. Therefore also put
some effort into understanding what has been learned about primes in the last two
hundred years.

One final word on my argument that empirical evidence provides a solid basis
for deciding mathematical questions. One can point to many counterexamples for
this, but I think if you examine these you will usually find they occur either because
the observations were not sufficient, or because the phenomena had abrupt changes
in behavior. The twin prime conjecture could fail if properties of very large numbers,
say with more than a million digits, are vastly different from those of smaller numbers.
However, since the properties that generate the integers are in play from the start,
it is against everything we know to believe that all large numbers will behave
fundamentally differently than smaller ones. Other famous unsolved problems depend
on every large number not having an unusual and unlikely property, and in that
situation one is on much shakier ground.



2. Primes Thin Out 



When people first begin to study prime numbers, they frequently want to find 
a formula for them. Ideally one would plug n into this formula, and the formula 
would produce the n-th prime. While there actually are such formulas (see for 
example [5]), they are so complicated to compute with that you are better off using 
the Sieve of Eratosthenes, a procedure for quickly finding all the primes up to a 
given size. To find all the primes up to $N$, for example, you first remove all the
multiples of 2 (the even numbers) from your list of natural numbers up to $N$, then
all multiples of 3, then multiples of 5, and so on. After removing all the multiples
of the initial primes $2, 3, 5, 7, \ldots, P$, the numbers left in your list greater than $P$ and
less than $P^2$ will be exactly the primes in this range, since the primes in this range
will not have been removed, while any composite number with no prime factors
$\le P$ must necessarily be larger than $P^2$. If you therefore pick $P$ to be the largest
prime less than $\sqrt{N}$, you will obtain a list of the primes up to $N$. For example, to
find all the primes less than a million, you only need to sieve with the 168 primes
less than a thousand. Despite this simple procedure for generating the primes, the
individual primes appear in a highly irregular pattern, as a glance at the list in (1)
reveals. This irregularity makes it unrealistic that the primes are obtainable by a
simple plug-in formula.
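The sieving procedure described above can be sketched in Python as follows. This is one common way to write it, not the only one: the function name `sieve` is an illustrative choice, and crossing out starts at $p^2$ because smaller multiples of $p$ have already been removed by smaller primes.

```python
from math import isqrt


def sieve(n: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes <= n."""
    if n < 2:
        return []
    is_candidate = bytearray([1]) * (n + 1)
    is_candidate[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, isqrt(n) + 1):
        if is_candidate[p]:
            # Cross out p*p, p*p + p, ...; only primes p <= sqrt(n) are needed.
            is_candidate[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_candidate[i]]


print(len(sieve(999)))      # number of primes less than a thousand
print(len(sieve(10**6)))    # number of primes up to a million
```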

In view of these considerations, the first step in understanding primes is to think
of them as a natural phenomenon and look for statistical rather than exact data.
Clearly they get rarer as we move towards larger numbers, and there is a simple
reason for this. To start with, after 2 all the multiples of 2, the even numbers,
cannot be primes, and therefore we have eliminated one half of all the natural
numbers. Next, the multiples of 3 (which could be called the "threeven" numbers
but have no name in English) cannot be primes, and this eliminates a third of the
numbers. However, half of these multiples of three are also even and have already
been eliminated, so they need to be added back as a correction for this overcounting.
Thus the proportion of natural numbers divisible by neither 2 nor 3 is

(5)    $1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{6} = \frac{1}{3}.$

You can easily check for yourself that, by what is called an inclusion-exclusion
process, the proportion of natural numbers not divisible by 2, 3, or 5 is

(6)    $1 - \frac{1}{2} - \frac{1}{3} - \frac{1}{5} + \frac{1}{2 \cdot 3} + \frac{1}{2 \cdot 5} + \frac{1}{3 \cdot 5} - \frac{1}{2 \cdot 3 \cdot 5} = \frac{4}{15}.$



One such problem is the existence of Landau-Siegel zeros, which we will encounter later in
this paper.

The sum on the left-hand side is actually

(7)    $\left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right)\left(1 - \frac{1}{5}\right),$

and now we can see that the above analysis is better understood in terms of simple
probability. The probability that a number is not even is 1/2 = (1 - 1/2). The
probability that a number is not divisible by 3 is 2/3 = (1 - 1/3), and the probability
that a number is not divisible by 5 is 4/5 = (1 - 1/5). But these events are
independent of each other, since knowing only that a number has a prime factor of
2 tells you nothing about whether it has a prime factor of 3 or 5. Hence the probability
that all these conditions hold is the product of the individual probabilities. Thus
the probability that a natural number is not divisible by any of the primes up to $P$ is

(8)    $\prod_{p \le P} \left(1 - \frac{1}{p}\right).$

Clearly this product is never negative, and you might guess

(9)    $\lim_{P \to \infty} \prod_{p \le P} \left(1 - \frac{1}{p}\right) = 0,$

so that the probability a random integer is a prime decreases to zero as the size of
the integer gets larger. This was proved by Legendre. You might try to prove this
yourself, although it is difficult if you haven't seen the proof before. As a first step,
if you take logarithms of both sides of (9), the problem can be reduced with a little
calculus to showing that

(10)    $\sum_{p} \frac{1}{p} = \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots = \infty;$

the sum of the reciprocals of the primes diverges. You can find a proof in many
beginning number theory books, usually in the section on Mertens' theorem. (See
for example [2], or [8].)
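How slowly the product tends to zero, and how slowly the sum of reciprocals grows, can be seen numerically. The sketch below is only an illustration: `sieve`, `surviving_fraction`, and `reciprocal_sum` are names chosen here, and the comparison with $\log\log P$ reflects the standard statement of Mertens' theorem rather than anything computed in the article.

```python
from math import isqrt, log


def sieve(n: int) -> list[int]:
    """All primes <= n by the Sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]


def surviving_fraction(P: int) -> float:
    """Product of (1 - 1/p) over primes p <= P."""
    fraction = 1.0
    for p in sieve(P):
        fraction *= 1 - 1 / p
    return fraction


def reciprocal_sum(P: int) -> float:
    """Sum of 1/p over primes p <= P."""
    return sum(1 / p for p in sieve(P))


for P in (100, 10_000, 1_000_000):
    print(P, surviving_fraction(P), reciprocal_sum(P), log(log(P)))
```

The last two columns differ by a quantity that stays bounded, which is consistent with the sum of reciprocals diverging, but only at the rate of $\log\log P$.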

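There is a standard formulation of the Sieve of Eratosthenes involving the Möbius
function $\mu$ defined by

(11)    $\mu(n) = \begin{cases} 1, & \text{if } n = 1, \\ (-1)^r, & \text{if } n = p_1 p_2 \cdots p_r, \\ 0, & \text{if } n \text{ has a repeated prime factor.} \end{cases}$

The product in (8) can now be expressed as a sum by

(12)    $\prod_{p \le P} \left(1 - \frac{1}{p}\right) = \sum_{d \mid \mathcal{P}} \frac{\mu(d)}{d}, \qquad \mathcal{P} = 2 \cdot 3 \cdot 5 \cdots P,$

where the condition $d \mid \mathcal{P}$ means that we sum over all divisors of $\mathcal{P}$. You can see
that equation (6) is an example of this formula with $\mathcal{P} = 2 \cdot 3 \cdot 5$. We now rewrite
(9) as

(13)    $\lim_{P \to \infty} \sum_{d \mid \mathcal{P}} \frac{\mu(d)}{d} = 0.$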
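The identity between the product and the Möbius sum can be checked exactly with rational arithmetic. The following is a small sketch; the function names are illustrative, and the divisors are found by brute force, which is fine for a product of a few small primes.

```python
from fractions import Fraction
from math import prod


def mobius(n: int) -> int:
    """Mobius function: 0 if n has a repeated prime factor, else (-1)**r."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d divides the original n twice
                return 0
            result = -result
        d += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result


def product_side(primes: list[int]) -> Fraction:
    """Product of (1 - 1/p) over the given primes."""
    out = Fraction(1)
    for p in primes:
        out *= 1 - Fraction(1, p)
    return out


def sum_side(primes: list[int]) -> Fraction:
    """Sum of mu(d)/d over all divisors d of the product of the given primes."""
    big_p = prod(primes)
    return sum(Fraction(mobius(d), d) for d in range(1, big_p + 1) if big_p % d == 0)


print(product_side([2, 3, 5]), sum_side([2, 3, 5]))
```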



Numerically one does observe this, although the rate at which this proportion goes to zero is
surprisingly slow. For example, sieving by the 25 primes less than 100 leaves 12% of the integers,
while sieving by the 1229 primes less than 10,000 still leaves 6% of the natural numbers.

Figure 1. The graph of $\pi(x)$ for $1 \le x \le 100$



Since this sum will eventually run through all the natural numbers, it would seem
clear that

(14)    $\sum_{n=1}^{\infty} \frac{\mu(n)}{n} = 0.$

However, this turns out to be very difficult to deduce, and was only proved in
1899 by Landau, about a hundred years after Legendre's formula. Equation (14) is
essentially equivalent to the prime number theorem, which we will introduce in the
next section. The failure of sieve methods to prove results like (14) led to the
dominance of analytic methods in the study of primes for many years. However, many
of the recent important advances in the subject have depended on the combination
of both sieve and analytic methods.



3. The Prime Number Theorem 

Let us now examine statistically the rate at which the primes thin out. We define
$\pi(x)$ to be the number of primes less than or equal to $x$. Thus for example $\pi(5) = 3$
since 2, 3, and 5 are the three primes less than or equal to 5. You can see from (1)
that $\pi(100) = 25$. In Figure 1 above is the graph of $\pi(x)$ for $1 \le x \le 100$. Clearly
$\pi(x)$ is a step function with jumps at the primes, and therefore when looked at
closely is as complicated as the primes. But when you move back and view it over
a longer range $\pi(x)$ becomes extremely regular, as you can see in Figures 2 and 3
where $\pi(x)$ is graphed over the range $1 \le x \le 1000$, and $1 \le x \le 1{,}000{,}000$.



This should not be too surprising for those of you who have studied conditionally convergent
series in calculus. It is actually the existence of a limit which is hard to prove; it is relatively easy
to prove that if there is a limit it must be zero.

Figure 2. The graph of $\pi(x)$ for $1 \le x \le 1000$

Figure 3. The graph of $\pi(x)$ for $1 \le x \le 1{,}000{,}000$



What should be clear from these graphs is that $\pi(x)$ must approach a very
simple function as $x \to \infty$. This fact is now called the prime number theorem, and
it asserts that

(15)    $\pi(x) \sim \frac{x}{\log x},$

where $\sim$ means the ratio of the two sides approaches 1 as $x \to \infty$. Here log is
the natural logarithm to the base $e = 2.71828\ldots$, and you might wonder why the
primes know about this transcendental number. The answer is because the integers
also know about $e$ since

$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n;$

the harmonic series grows like the natural logarithm. While this is easily proved
by calculus since the series is well approximated by the corresponding integral, the
connection with the prime number theorem is much more difficult to establish.

Figure 4. The graph of $\pi(x)$ (above) and $x/\log x$ (below) for $1 \le x \le 1{,}000{,}000$

In Figure 4 we have graphed $\pi(x)$ and $x/\log x$ together; $\pi(x)$ is the upper curve.
As you can see, this isn't a very good fit, but it suggests (at least in hindsight)
the correct approximation. The prime number theorem says that on average the
probability that a number between 1 and $x$ is a prime is $1/\log x$, and therefore
an individual number $n$ should have probability $1/\log n$ of being a prime. This
density function no longer depends on the large global variable $x$, and to find the
total number of primes we should add up the local probabilities or equivalently
integrate the density function. Therefore it makes sense to approximate $\pi(x)$ with
the logarithmic integral

(16)    $\mathrm{li}(x) = \int_2^x \frac{dt}{\log t}.$

There is no point in comparing the graphs of $\pi(x)$ and $\mathrm{li}(x)$ over the range
$1 \le x \le 1{,}000{,}000$ since the two graphs are so close they appear to be the same
graph. In Figure 5 we graph $\pi(x)$, $\mathrm{li}(x)$, and $x/\log x$ in the range
$1{,}000{,}000 \le x \le 1{,}100{,}000$; the top curve is $\mathrm{li}(x)$ and the bottom curve is $x/\log x$.

The majority of mathematicians use log instead of ln to denote the natural logarithm except
when they are teaching a calculus class.

Figure 5. The graph of $\mathrm{li}(x)$ (top), $\pi(x)$, and $x/\log x$ (bottom)
for $1{,}000{,}000 \le x \le 1{,}100{,}000$
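The three quantities being compared in these figures are easy to compute at a single point such as $x = 10^6$. The sketch below is a rough illustration only: it assumes the logarithmic integral is taken from 2 to $x$, evaluates it with Simpson's rule after the substitution $t = e^u$, and uses helper names (`prime_count`, `li`) that are not from the article.

```python
from math import exp, isqrt, log


def prime_count(x: int) -> int:
    """pi(x): the number of primes <= x, counted with a sieve."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return sum(flags)


def li(x: float, steps: int = 20_000) -> float:
    """Integral of dt/log(t) from 2 to x (lower limit 2 is an assumption).

    With t = e^u the integrand becomes e^u / u, which Simpson's rule handles well.
    `steps` must be even.
    """
    a, b = log(2), log(x)
    h = (b - a) / steps
    total = exp(a) / a + exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * exp(u) / u
    return total * h / 3


x = 10**6
print(prime_count(x), x / log(x), li(x))
```

At $x = 10^6$ the simple approximation $x/\log x$ misses $\pi(x)$ by several thousand, while the logarithmic integral is off by only a little over a hundred.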

This evidence for the prime number theorem was first noticed independently by
Gauss and Legendre near the end of the 18th century, but its proof was only found
in the closing years of the 19th century, when in 1896 Hadamard and de la Vallée
Poussin independently proved the prime number theorem. Using integration by
parts it is easy to see

(17)    $\mathrm{li}(x) = \frac{x}{\log x} + \frac{x}{(\log x)^2} + \frac{2!\,x}{(\log x)^3} + \frac{3!\,x}{(\log x)^4} + \cdots,$

and in 1899 de la Vallée Poussin proved that $\mathrm{li}(x)$ is a better fit to $\pi(x)$ than any
finite truncation of the series in (17).



It was also noted empirically that $\mathrm{li}(x)$ is always found to be larger than $\pi(x)$,
which suggests the conjecture that this will always remain true. This however turns
out to be false, as proved by Littlewood in 1914. This famous result is frequently
cited as an example of the danger of using empirical data instead of proofs. However,
I think the graphs above make it rather speculative to guess anything long range
for this finer order behavior.



The teenage Gauss found the correct approximation $\mathrm{li}(x)$, while Legendre found $x/\log x$ and
speculated incorrectly on higher order approximations.

4. The Riemann Hypothesis

The extraordinarily good fit between $\pi(x)$ and $\mathrm{li}(x)$, far better than the first
approximation $x/\log x$, has been the subject of intensive but largely unsuccessful
investigation for the last one hundred years. From probability considerations one
might expect that the fit should be about (or a little bigger than) the square root
of the approximation. This may be observed empirically for primes, and suggests
that the error in the prime number theorem satisfies the bound, for $x$ sufficiently
large,

(18)    $|\pi(x) - \mathrm{li}(x)| \le C x^{\frac{1}{2} + \epsilon}$, for any $\epsilon > 0$,

and some constant $C$. This statement turns out to be equivalent to a conjecture
Riemann made in 1859 concerning the zeros of the Riemann zeta-function. The
Riemann zeta-function is defined by

(19)    $\zeta(x + iy) = \sum_{n=1}^{\infty} \frac{1}{n^{x+iy}}, \quad \text{for } x > 1,$

where $i = \sqrt{-1}$. This function can be studied as a function of the complex variable
$z = x + iy$. The starting point for the proof of the prime number theorem is the
Euler product identity

(20)    $\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z} = \prod_{p} \left(1 + \frac{1}{p^z} + \frac{1}{p^{2z}} + \frac{1}{p^{3z}} + \cdots\right) = \prod_{p} \left(1 - \frac{1}{p^z}\right)^{-1},$

which you can verify by seeing how the terms in the first product with primes
2, 3, 5, . . . can be multiplied out to give each natural number exactly once. (The
series in the first product is a geometric series which converges for $x > 1$ to the
expression in the second product.)
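The Euler product identity can be tested numerically at a real point such as $z = 2$, where the sum is known to equal $\pi^2/6$. The following sketch truncates both the series and the product; the cutoffs and the function names `zeta_sum` and `zeta_product` are arbitrary choices made for this illustration.

```python
from math import isqrt, pi


def primes_up_to(n: int) -> list[int]:
    """All primes <= n by the Sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]


def zeta_sum(z: float, terms: int) -> float:
    """Partial sum of 1/n^z for n = 1..terms (real z > 1)."""
    return sum(1 / n**z for n in range(1, terms + 1))


def zeta_product(z: float, limit: int) -> float:
    """Partial Euler product of 1/(1 - p^(-z)) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1 / (1 - p**-z)
    return product


print(zeta_sum(2, 10**5), zeta_product(2, 10**5), pi**2 / 6)
```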

The Riemann zeta-function can never be zero if $x > 1$. The series and product in
(20) only converge for $x > 1$, but one can find alternative expressions which agree
with these formulas for $x > 1$ but continue to be valid for all $z$ except $z = 1$, which
is a singularity of the zeta-function. This extension is the unique differentiable
extension, which is called the analytic continuation. In the vertical strip in the
complex plane $0 < x < 1$ the zeta-function is equal to zero at many values of $z$;
these places are called "zeta-zeros" or briefly "zeros". The Riemann Hypothesis
is that all these zeros occur at complex numbers $z = 1/2 + iy$ on the vertical line
with real part equal to 1/2. While it has been verified that the first 10 trillion zeros
in this strip above and below the real axis lie on this line, a proof is not in sight.
The Clay Institute has offered a million dollar prize for a proof of the Riemann
Hypothesis, and also million dollar prizes for 6 other "Millennium" problems.
While the Riemann Hypothesis is decisive in determining the distribution of primes,
it seems to be of little help with regard to twin primes.

5. The Twin Prime Number Theorem 

What about twin primes? One immediately notices that twin primes thin out
faster than primes. Here we have a famous theorem of Brun from 1919 that in
contrast to (10)

(21)    $\sum_{p \text{ a twin prime}} \frac{1}{p} = \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \frac{1}{13} + \frac{1}{17} + \frac{1}{19} + \frac{1}{29} + \frac{1}{31} + \cdots < \infty;$

the sum of the reciprocals of the twin primes converges.

Figure 6. The graph of $\pi_2(x)$ for $0 \le x \le 100$



Let $\pi_2(x)$ be the number of twin prime pairs with the smaller prime in the pair
less than or equal to $x$. Thus for example $\pi_2(10) = 2$ because of the two twin prime
pairs (3, 5) and (5, 7). In Figures 6, 7 and 8 we graph $\pi_2(x)$ just as we did before for
$\pi(x)$. While somewhat more irregular, we once again see an asymptotic behavior
developing.

What is the correct asymptotic function? Returning to probability considerations,
what is the probability that $n$ and $n + 2$ are both prime? The prime number
theorem is consistent with assigning the probability that a random number $n$ is
prime to be $1/\log n$; this is called the Cramér model, introduced and made use of by
H. Cramér in 1935 [3]. The chance that $n + 2$ is prime is then $1/\log(n+2) \sim 1/\log n$.
Therefore by independence the probability of both being prime is $1/(\log n)^2$. This
suggests that the correct first order approximation for $\pi_2(x)$ should be $x/(\log x)^2$,
or more precisely

(22)    $\mathrm{li}_2(x) = \int_2^x \frac{dt}{(\log t)^2}.$

We have graphed these in Figure 9, and as you can see we clearly have the wrong
answer.

The sum converges to 1.90216 . . . (when it is taken over twin prime pairs, so that 1/5 is
counted twice). One odd aspect of this type of result is that while we
do not know if this sum is over a finite or infinite number of twin primes, we can still compute
its value as precisely as available computer resources allow. It was in computing this constant in
1994 that Nicely discovered a flaw in the Intel Pentium chip, known but not reported by Intel,
which created a public furor ("Intel: quality is job 0.999999998") and ended up costing Intel
hundreds of millions of dollars in recalled chips.

Figure 8. The graph of $\pi_2(x)$ for $2 \le x \le 1{,}000{,}000$

Figure 9. The graph of $\pi_2(x)$ (top), $\mathrm{li}_2(x)$ (middle), and
$x/(\log x)^2$ for $1 \le x \le 10^6$

Although we cannot prove anything, there is a heuristic argument which
suggests the correct answer. This argument can be formulated in various ways, but
here we follow Soundararajan [13]. One problem with the Cramér model is that
it fails to take into account divisibility. Thus, for the primes $p > 2$, the probability
that $p + 1$ is prime is not $1/\log(p + 1)$ as suggested by the Cramér model but rather
0, since $p + 1$ is even. Further, $p + 2$ is necessarily odd; therefore it is twice as likely to
be prime as a random number. The conclusion is that $n$ and $n + 2$ being primes are
not independent events. Let us now correct for this lack of independence. First, for
large twin primes, we need both $n$ and $n + 2$ to not be divisible by $2, 3, 5, 7, 11, \ldots$.
The chance that two random numbers are both odd is (1/2)(1/2) = 1/4, but, since
$n$ being odd forces $n + 2$ to be odd, the chance that $n$ and $n + 2$ are both odd is
1/2, and thus twice as large as random. The chance that two random numbers are
both not divisible by 3 is (2/3)(2/3) = 4/9, but the chance that $n$ and $n + 2$ are
both not divisible by 3 is 1/3, since this occurs if and only if $n$ is congruent to 2
modulo 3.
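These small probabilities can be confirmed by simply counting residue classes. The sketch below does this with exact fractions; the function names are illustrative, and the `shift` parameter is an assumption added so that the same count works for pairs other than $n, n + 2$.

```python
from fractions import Fraction


def both_coprime_fraction(p: int, shift: int = 2) -> Fraction:
    """Fraction of residues n mod p for which neither n nor n + shift is divisible by p."""
    good = sum(1 for n in range(p) if n % p != 0 and (n + shift) % p != 0)
    return Fraction(good, p)


def random_pair_fraction(p: int) -> Fraction:
    """Chance that two independent random numbers are both not divisible by p."""
    return (1 - Fraction(1, p)) ** 2


for p in (2, 3, 5, 7):
    dependent = both_coprime_fraction(p)
    independent = random_pair_fraction(p)
    print(p, dependent, independent, dependent / independent)
```

The last column is the factor by which the independence assumption has to be corrected at each prime.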

In general, the probability that two random numbers are both not divisible by $p > 2$
is $(1 - 1/p)^2$, while the probability that both $n$ and $n + 2$ are not divisible by $p$ is the
slightly smaller $1 - 2/p$, since $n$ must miss the two residue classes 0 and $-2$ modulo
$p$. Therefore, the correction factor to the Cramér model for lack of independence
is 2 if $p = 2$, and for $p \ge 3$ is

$\frac{1 - \frac{2}{p}}{\left(1 - \frac{1}{p}\right)^2} = 1 - \frac{1}{(p-1)^2}.$

We conclude that the correct approximation for twin primes should be

(23)    $\pi_2(x) \sim 2 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) \frac{x}{(\log x)^2}.$

Given our experience from the prime number theorem, we formulate this as the
following conjecture.

The Twin Prime Number Theorem Conjecture. We have

(24)    $\pi_2(x) \sim 2 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) \mathrm{li}_2(x).$

The twin prime constant has been computed to many digits, and it is known
that

(25)    $2 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) = 1.3203236316\ldots$

Figure 10. The graph of $\pi_2(x)$ (bottom) and $1.32032362\,\mathrm{li}_2(x)$
(top) for $1 \le x \le 1{,}000{,}000$

In Figure 10 we see that this conjecture provides what is surely the correct
approximation to $\pi_2(x)$. One might even conjecture that this approximation holds with
a square root error. The conjecture that the distribution of twin primes satisfies a
Riemann Hypothesis type error term is well supported empirically, but I think this
might be a problem that survives the current millennium.
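The conjectured approximation can be compared with an actual count of twin primes. The sketch below is illustrative only: it truncates the infinite product at a finite prime limit, assumes the integral for $\mathrm{li}_2$ runs from 2 to $x$, evaluates it by Simpson's rule after the substitution $t = e^u$, and uses helper names that are not from the article.

```python
from math import exp, isqrt, log


def sieve_flags(n: int) -> bytearray:
    """flags[i] == 1 exactly when i is prime, for 0 <= i <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return flags


def twin_prime_count(x: int) -> int:
    """pi_2(x): number of pairs (p, p + 2) of primes with p <= x."""
    flags = sieve_flags(x + 2)
    return sum(1 for p in range(3, x + 1) if flags[p] and flags[p + 2])


def twin_constant(limit: int) -> float:
    """2 * product of (1 - 1/(p-1)^2) over primes 3 <= p <= limit (truncated product)."""
    flags = sieve_flags(limit)
    value = 2.0
    for p in range(3, limit + 1):
        if flags[p]:
            value *= 1 - 1 / (p - 1) ** 2
    return value


def li2(x: float, steps: int = 20_000) -> float:
    """Integral of dt/(log t)^2 from 2 to x, via Simpson's rule in u = log t."""
    a, b = log(2), log(x)
    h = (b - a) / steps
    total = exp(a) / a**2 + exp(b) / b**2
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * exp(u) / u**2
    return total * h / 3


x = 10**6
print(twin_prime_count(x), twin_constant(10**5) * li2(x), li2(x))
```

Without the constant, $\mathrm{li}_2(x)$ alone clearly undercounts, while with it the prediction lands within about one percent of the true count at $x = 10^6$.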

6. Some recent progress towards the twin prime conjecture. 

One appealing aspect of number theory is that it is hard to predict which problems
can be solved at our present state of knowledge, and which are currently
beyond hope of solution. Fermat's Last Theorem, for example, looked hopelessly
hard until a totally new approach was discovered, and even that approach had
extremely difficult obstacles. And yet in 10 years the proof was complete.

In the case of twin primes, there have been several outstanding advances which
to outward appearances could seem just as formidable as the twin prime conjecture.
To mention just two of these results, it has been proved that for large enough $x$,

(26)    $\pi_2(x) \le 8 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) \frac{x}{(\log x)^2}.$

Comparing this with (23) or (24) we see that there can be at most 4 times as many
twin primes as conjectured. Secondly, J. Chen proved that there are infinitely many
primes $p$ where $p + 2$ is either a prime or a product of two primes, see for example

Neither of these results addresses the question of finding prime numbers which are
close together. This problem has a long history, and I would like to conclude this
paper by mentioning some new work of Pintz, Yildirim and me on this topic. The
question we were initially investigating was not directed at the twin prime conjecture,
but rather the question of finding smaller than average gaps between primes.
By the prime number theorem the average distance between two consecutive primes
in the interval $[0, x]$ is

(27)    $\frac{\text{length of } [0, x]}{\text{number of primes in } [0, x]} = \frac{x}{\pi(x)} \sim \frac{x}{x/\log x} = \log x.$

Thus for example around $x = 10^6$ primes are on average about $\log(10^6) = 6 \log(10) =
13.81\ldots$ apart and this spacing doubles to $27.63\ldots$ at $x = 10^{12}$. We can now ask
if there are always going to be primes substantially closer than this average as $x$
gets larger and larger. To examine this question, consider the sequence, where $p_n$
denotes the $n$-th prime,

$\frac{p_{n+1} - p_n}{\log p_n};$

we expect that these values will infinitely often be small. Mathematically we measure
this by looking for the smallest limit point of the sequence, i.e. the "lim inf".
Thus we define

(28)    $\Delta := \liminf_{n \to \infty} \left(\frac{p_{n+1} - p_n}{\log p_n}\right)$

and hope to prove $\Delta$ is small. Remarkably, 80 years of work had only found
that $\Delta \le .248\ldots$, a result of Maier [10] from 1988 which utilized all the previous
methods applied to the problem. Then last year we finally were able to prove the
result suggested by the twin prime conjecture.

Theorem. (Goldston-Pintz-Yildirim) We have $\Delta = 0$.
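The average spacing and the normalized gaps $(p_{n+1} - p_n)/\log p_n$ can be examined directly in a short range. The sketch below looks at the primes between $10^6$ and $1.1 \cdot 10^6$; the range and the helper name `gap_statistics` are arbitrary choices for illustration, and a finite computation like this of course says nothing rigorous about the lim inf.

```python
from math import isqrt, log


def sieve(n: int) -> list[int]:
    """All primes <= n by the Sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]


def gap_statistics(lo: int, hi: int) -> tuple[float, float]:
    """Average gap and smallest normalized gap among primes in [lo, hi]."""
    primes = [p for p in sieve(hi) if p >= lo]
    pairs = list(zip(primes, primes[1:]))
    average_gap = sum(q - p for p, q in pairs) / len(pairs)
    smallest_normalized = min((q - p) / log(p) for p, q in pairs)
    return average_gap, smallest_normalized


average_gap, smallest_normalized = gap_statistics(10**6, 11 * 10**5)
print(log(10**6), average_gap, smallest_normalized)
```

The smallest normalized gap in any such range comes from a twin prime pair, which is why the twin prime conjecture would immediately give $\Delta = 0$.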

Our method thus produces primes very close together in a statistical sense, but 
what came as a great surprise to us is that if you assume an unproved conjecture 
concerning primes in arithmetic progressions then the method actually produces 
primes that are a bounded distance apart. That one can prove such a strong result 
using this information runs counter to all previous expectations, and for several 
weeks this convinced us that there must be a mistake in our proof. 

Erdős proved that this sequence has many limit points in the sense that the set of limit points
has positive Lebesgue measure. Unfortunately the proof does not tell us the value of any of these
limit points.

To describe this result we begin with a simple example. If you divide the natural
numbers up modulo 3 you get three residue classes or arithmetic progressions:

    $n \equiv 0 \pmod 3$:    3, 6, 9, 12, 15, ...
    $n \equiv 1 \pmod 3$:    1, 4, 7, 10, 13, ...
    $n \equiv 2 \pmod 3$:    2, 5, 8, 11, 14, ...



Clearly the only prime in the progression $0 \pmod 3$ is 3, but we expect that the
primes should be equally distributed in the other two progressions, and hence,
letting $\pi(x; q, a)$ denote the number of primes $\le x$ in the progression $a \pmod q$,

$\pi(x; 3, 1) \sim \pi(x; 3, 2) \sim \frac{1}{2} \pi(x) \sim \frac{x}{2 \log x}.$
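The equal distribution of primes between the two progressions can be checked by counting. The following sketch tallies the primes up to a million by residue class; `residue_counts` is a hypothetical helper name, and it accepts any modulus $q$, not just 3.

```python
from math import isqrt


def sieve(n: int) -> list[int]:
    """All primes <= n by the Sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]


def residue_counts(x: int, q: int) -> dict[int, int]:
    """Map each residue a mod q to the number of primes p <= x with p = a (mod q)."""
    counts = {a: 0 for a in range(q)}
    for p in sieve(x):
        counts[p % q] += 1
    return counts


print(residue_counts(10**6, 3))
```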

The generalization of this result is called the prime number theorem for arithmetic
progressions. We need to avoid progressions like $0 \pmod 3$ where each term
is a multiple of some integer $\ge 2$; of the $q$ progressions modulo $q$ the number of
progressions which are not multiples of some number is $\phi(q)$, the Euler phi function,
defined by

(29)    $\phi(q) := \#\{a : 1 \le a \le q \text{ and } (a, q) = 1\},$

where here the notation $(a, q)$ is the gcd of $a$ and $q$. The prime number theorem
for arithmetic progressions states that for $(a, q) = 1$,

(30)    $\pi(x; q, a) \sim \frac{\mathrm{li}(x)}{\phi(q)}.$

For applications we need $q = q(x) \to \infty$ as $x \to \infty$, but unfortunately the best
result known allows $q$ to only grow at the very slow rate $q \le (\log x)^A$, for any $A$.
However, in applications it is often enough to know that on average over
many progressions the error here is small, and for this one can take $q$ much larger.
The main result of this type was proved in 1965 independently by Bombieri and
Vinogradov, and states that for any $A > 1$ we have

(31)    $\sum_{q \le Q} \max_{(a, q) = 1} \left| \pi(x; q, a) - \frac{\mathrm{li}(x)}{\phi(q)} \right| \le C \frac{x}{(\log x)^A}$

for $Q = x^{1/2}/(\log x)^B$, where $B$ and $C$ are constants which depend on the given
$A$.

The extension of this result to larger $q$ requires the solution of two famous problems
involving Dirichlet L-functions, which are a class of functions that include the Riemann zeta-function
introduced earlier. The first problem is to show that there are no zeros on the real axis, called
Landau-Siegel zeros, and the second problem is to prove the Riemann Hypothesis for Dirichlet
L-functions.

The largest power of $x$ which we can take $Q$ to be in the above result is called
the level of distribution of primes in arithmetic progressions. Thus the Bombieri-Vinogradov
theorem says the primes have level of distribution 1/2. More precisely,
we define the level of distribution to be $\theta$ if (31) holds for any $\epsilon > 0$ and $Q = x^{\theta - \epsilon}$.
We expect and find numerically that the primes actually have level of distribution
$\theta = 1$; this was first conjectured by Elliott and Halberstam. What Pintz, Yildirim
and I proved is that if the primes have a level of distribution equal to any number
larger than 1/2, then there must be infinitely often primes a bounded distance apart.
In the case of level of distribution 1, we proved the following result.

Theorem. If the Elliott-Halberstam Conjecture is true (actually if $\theta > .971$),
then

(32)    $p_{n+1} - p_n \le 16$ for infinitely many $n$.

The proof of these results is not that difficult compared to other results in the
field, but I can only describe the main ideas here. In the first place, we need a
generalization of twin primes where we consider the tuple or vector

(33)    $(n + h_1, n + h_2, \ldots, n + h_k)$

with the shifts $h_i$ given by $\mathcal{H} = \{h_1, h_2, \ldots, h_k\}$. Letting $n = 1, 2, 3, \ldots$ we ask
how often all of the components of the tuple are simultaneously prime for $n \le x$,
and denote this number by $\pi(x; \mathcal{H})$. Thus for example twin primes correspond to
$\mathcal{H} = \{0, 2\}$ with the tuple $(n, n + 2)$. On the other hand the tuple $(n, n + 1)$ is
only made up of primes when $n = 2$ since at least one of these numbers is even.
Similarly $(n, n + 2, n + 4)$ is only made up of primes when $n = 3$ since at least one
of these numbers is divisible by 3. Tuples which do not always have a component
divisible by some integer $\ge 2$ are called admissible, and for these we expect that
infinitely often all their components will simultaneously be primes. This is called
the Hardy-Littlewood prime tuple conjecture. Hardy and Littlewood also made a
more precise conjecture. Let $\nu_p(\mathcal{H})$ denote the number of distinct residue classes
(mod $p$) the numbers $h \in \mathcal{H}$ fall into. Just as for the twin prime constant, we can
correct for the lack of independence and obtain an expected proportion, called
the singular series

(34)    $\mathfrak{S}(\mathcal{H}) = \prod_{p} \left(1 - \frac{1}{p}\right)^{-k} \left(1 - \frac{\nu_p(\mathcal{H})}{p}\right).$

If $\mathfrak{S}(\mathcal{H}) \ne 0$ then $\mathcal{H}$ is admissible. Thus $\mathcal{H}$ is admissible if and only if $\nu_p(\mathcal{H}) < p$
for all $p$.

Prime Tuple Conjecture. If $\mathcal{H}$ is admissible, then

(35)    $\pi(x; \mathcal{H}) \sim \mathfrak{S}(\mathcal{H})\, \mathrm{li}_k(x)$, where $\mathrm{li}_k(x) = \int_2^x \frac{dt}{(\log t)^k}$.
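Admissibility and the singular series are both straightforward to compute for small tuples. The sketch below assumes the standard Hardy-Littlewood form of the singular series, a product over primes of $(1 - 1/p)^{-k}(1 - \nu_p(\mathcal{H})/p)$, truncated at a finite prime limit; the names `nu`, `is_admissible`, and `singular_series` are illustrative.

```python
from math import isqrt


def sieve(n: int) -> list[int]:
    """All primes <= n by the Sieve of Eratosthenes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]


def nu(p: int, shifts: list[int]) -> int:
    """Number of distinct residue classes mod p occupied by the shifts."""
    return len({h % p for h in shifts})


def is_admissible(shifts: list[int]) -> bool:
    """True if no prime p has all p residue classes covered by the shifts.

    Only primes p <= k need checking, since k shifts cover at most k classes.
    """
    return all(nu(p, shifts) < p for p in sieve(len(shifts)))


def singular_series(shifts: list[int], limit: int = 10**5) -> float:
    """Truncated product over primes p <= limit (assumes the shifts are distinct)."""
    k = len(shifts)
    value = 1.0
    for p in sieve(limit):
        value *= (1 - nu(p, shifts) / p) / (1 - 1 / p) ** k
    return value


for shifts in ([0, 2], [0, 1], [0, 2, 4], [0, 2, 6]):
    print(shifts, is_admissible(shifts), singular_series(shifts))
```

For twin primes the truncated product reproduces the constant 1.3203... from the previous section, and for the inadmissible tuples it is exactly zero.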

The first idea in our method is to try to replace this counting function by an
approximation for which we can prove asymptotic formulas corresponding to the
prime tuple conjecture. For this we introduce the von Mangoldt function $\Lambda(n)$,
defined to be $\log p$ if $n = p^m$ and zero otherwise. This function tells us whether $n$
is a prime or prime power, but since the number of powers is very small they can
be removed from consideration at a later stage. You might try to prove in a few
lines that the prime number theorem, in the form stated earlier, implies and is
easily obtained from the formula



(36) $\sum_{n \le x} \Lambda(n) \sim x$.
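
To get a feeling for the von Mangoldt function, you can tabulate it and watch the ratio of its summatory function to $x$ approach 1. The Python sketch below does this by trial division. The helper names `von_mangoldt` and `psi` are my own, and the sample values of $x$ are arbitrary.

```python
import math


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n), so n itself is prime


def psi(x: int) -> float:
    """The sum of Lambda(n) over n <= x."""
    return sum(von_mangoldt(n) for n in range(1, x + 1))


for x in (100, 1_000, 10_000):
    print(x, round(psi(x) / x, 4))
```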



Footnote. This is done in Soundararajan's paper [13].

Footnote. You can see easily there are $\le \sqrt{x}$ squares that are $\le x$, and $\le \log_2 x$ different powers $\le x$.

In a first course in number theory we prove the elementary formula, which you
might also try to prove yourself, 

(37) $\Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}$.
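
Before trying to prove this identity, you may want to confirm it numerically. The Python sketch below compares the von Mangoldt function with the divisor sum $\sum_{d \mid n} \mu(d) \log(n/d)$ for small $n$. The function names are mine, and the Möbius function is computed by plain trial division.

```python
import math


def mobius(d: int) -> int:
    """Moebius function mu(d), computed by trial division."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # d has a square factor
            result = -result
        p += 1
    return -result if d > 1 else result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)


def divisor_sum(n: int) -> float:
    """Sum of mu(d) * log(n / d) over all divisors d of n."""
    return sum(mobius(d) * math.log(n / d) for d in range(1, n + 1) if n % d == 0)


# Largest discrepancy between the two sides for 1 <= n <= 300.
max_error = max(abs(von_mangoldt(n) - divisor_sum(n)) for n in range(1, 301))
print(max_error)
```

The printed discrepancy should be at the level of floating-point rounding.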

This formula has little utility in applications because it has too many terms. To
understand this last statement, you can try to prove (36) by substituting in (37)
and seeing what goes wrong. One can, however, by direct substitution and using
the prime number theorem obtain formulas like (36) for the truncated smoothed
approximation

(38) $\Lambda_R(n) = \sum_{d \mid n,\ d \le R} \mu(d) \log \frac{R}{d}$

if R is kept somewhat smaller than N. This approximation may seem ad hoc, but it 
arises naturally. Hardy and Littlewood also formulated the prime tuple conjecture 
in terms of the von Mangoldt function by defining 

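(39) $\Lambda(n; \mathcal{H}) := \Lambda(n + h_1) \Lambda(n + h_2) \cdots \Lambda(n + h_k)$,

(so this will be zero if any of the $\Lambda(n + h_i)$ is zero), and then equivalently to (35)
conjectured

(40) $\sum_{n \le N} \Lambda(n; \mathcal{H}) \sim \mathfrak{S}(\mathcal{H}) N$.

In view of (38) and (39), it is natural to approximate $\Lambda(n; \mathcal{H})$ by

(41) $\Lambda_R(n; \mathcal{H}) := \Lambda_R(n + h_1) \Lambda_R(n + h_2) \cdots \Lambda_R(n + h_k)$.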

Until 2004, this was the only useful approximation we knew of for this problem. 

We now try to detect primes with these approximations. For this, we need
an approximation which is never negative, but the approximation $\Lambda_R(n)$ and also
$\Lambda_R(n; \mathcal{H})$ is frequently negative. (For example you can check that $\Lambda_6(30) = -\log(6/5)$.)
Therefore we need to first square the approximation to obtain a non-negative
approximation. The key formulas we need to compute for our method are

(42) $\sum_{n \le N} \Lambda_R(n; \mathcal{H})^2$ and $\sum_{n \le N} \Lambda(n + h_0) \Lambda_R(n; \mathcal{H})^2$.
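
The sign behaviour of the truncated approximation is easy to explore by machine. The Python sketch below assumes the truncated divisor sum has the form $\Lambda_R(n) = \sum_{d \mid n,\ d \le R} \mu(d) \log(R/d)$ and that the tuple approximation is the product of these over the components. The function names and the choice $R = 6$ are mine, purely for illustration.

```python
import math


def mobius(d: int) -> int:
    """Moebius function mu(d), computed by trial division."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # d has a square factor
            result = -result
        p += 1
    return -result if d > 1 else result


def lambda_R(n: int, R: int) -> float:
    """Assumed truncated sum: mu(d) * log(R / d) over d | n with d <= R."""
    return sum(
        mobius(d) * math.log(R / d)
        for d in range(1, min(n, R) + 1)
        if n % d == 0
    )


def lambda_R_tuple(n: int, shifts, R: int) -> float:
    """Product of lambda_R over the components n + h of the tuple."""
    return math.prod(lambda_R(n + h, R) for h in shifts)


R = 6
print(lambda_R(7, R), math.log(R))  # a prime p > R only sees the divisor d = 1
print(lambda_R(30, R))              # a negative value
# Values of n <= 60 where the approximation is clearly negative.
print([n for n in range(1, 61) if lambda_R(n, R) < -1e-12])
# After squaring, the tuple approximation is never negative.
print(min(lambda_R_tuple(n, (0, 2), R) ** 2 for n in range(1, 61)) >= 0)
```

For a prime $p > R$ only $d = 1$ contributes, so the approximation returns $\log R$ rather than $\log p$; this is one way to see that it is a smoothed stand-in for $\Lambda$, not an exact detector.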

While these are complicated to evaluate, the analysis needed is at the level of the
prime number theorem with remainder, and therefore classical. For the second sum,
the single factor of $\Lambda(n + h_0)$ really is detecting primes (and prime powers) in the
tuple, since we find that if $h_0 \in \mathcal{H}$ we get a result larger by a factor of $\log R$ than
we get when $h_0 \notin \mathcal{H}$. The second sum is evaluated by summing $\Lambda(n + h_0)$ through
arithmetic progressions modulo products of divisors from $\Lambda_R(n; \mathcal{H})^2$, and this is
where the level of distribution information is used. The result of this analysis is
that for $R = N^{1/(4k) - \epsilon}$ we obtain asymptotic formulas for both sums in (42). Using
these formulas, we can evaluate asymptotically, with $r \ge 1$,



(43) $\mathcal{S} := \sum_{n=N+1}^{2N} \Big( \sum_{i=1}^{k} \Lambda(n + h_i) - r \log(3N) \Big) \Lambda_R(n; \mathcal{H})^2$.

Here, if $1 \le h_i \le N$, we have $\Lambda(n + h_i) \le \log(n + h_i) \le \log 3N$. Thus if we find
$\mathcal{S} > 0$ then there must be an $n$ for which at least $r + 1$ of the $\Lambda(n + h_i) \ne 0$, and
(after removing prime powers) there are $r + 1$ primes in the tuple $\mathcal{H}$.
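
The detection step is a pigeonhole argument, and it may help to see it written out. The display below is a sketch in the notation of (43), using only the bound $\Lambda(n + h_i) \le \log 3N$ and the fact that the squared weight is non-negative.

```latex
\[
\#\{\, i : \Lambda(n+h_i) \neq 0 \,\} \le r
\;\Longrightarrow\;
\sum_{i=1}^{k} \Lambda(n+h_i) \le r \log(3N)
\;\Longrightarrow\;
\Bigl( \sum_{i=1}^{k} \Lambda(n+h_i) - r \log(3N) \Bigr)\,
\Lambda_R(n;\mathcal{H})^2 \le 0 .
\]
```

So if every $n$ in the range had at most $r$ non-zero values $\Lambda(n + h_i)$, every term of the sum would be $\le 0$ and the whole sum could not be positive. A positive sum therefore forces some $n$ with at least $r + 1$ non-zero values.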



Unfortunately it turns out that $\mathcal{S} < 0$ for the approximation in (41) even when
$r = 1$, and we fail to prove there are even two primes in a tuple. However we are
able to recover something by this method, if we switch to the more modest goal of
finding two primes close together. For this, we now try to find as many primes as
possible in the interval $(n, n + h]$, which we detect by using all the possible $k$-tuples
that can be formed in that interval. Thus we now consider

(44) $\mathcal{S}' := \sum_{n=N+1}^{2N} \Big( \sum_{1 \le h_0 \le h} \Lambda(n + h_0) - r \log(3N) \Big) \sum_{\substack{1 \le h_1, h_2, \ldots, h_k \le h \\ \text{distinct}}} \Lambda_R(n; \mathcal{H})^2$.

With $r = 1$ we find this is positive if $h > \frac{3}{4} \log N$, and therefore we conclude
$\Delta \le 3/4$. One can now improve on this analysis by using approximations not
just for $k$-tuples but a linear combination of all the approximations for 2-tuples,
3-tuples, and so on up to $k$-tuples. This leads to an optimization problem which
when solved results in $\Delta \le 1/4$.

The next step is to search for better approximations than (41). The main
disadvantage of this approximation is that it is formed from many short divisor sums
multiplied together; in (43) there are $2k$ of them, which is what forces the very
short truncation length $R = N^{1/(4k) - \epsilon}$. This reduces the quality of the
approximation. It is natural to think that if we could approximate the number of primes in
a tuple by a single divisor sum then we could take the approximation much longer
and obtain a better result. As it turns out, sieve methods are based exactly on this
idea. Instead of the tuple $(n + h_1, n + h_2, \ldots, n + h_k)$ consider the polynomial

(45) $P_{\mathcal{H}}(n) = (n + h_1)(n + h_2) \cdots (n + h_k)$.

If our tuple is a prime tuple then $P_{\mathcal{H}}(n)$ has $k$ prime factors, and conversely. The
generalized von Mangoldt function

(46) $\Lambda_k(n) = \sum_{d \mid n} \mu(d) \left( \log \frac{n}{d} \right)^k$

is the arithmetic function commonly used to detect whether numbers have $\le k$
distinct prime factors. It can be proved that $\Lambda_k(n)$ is zero if $n$ has more than
$k$ distinct prime factors, but is non-zero if $n$ has $\le k$ prime factors. Therefore
$\Lambda_k(P_{\mathcal{H}}(n))$ will be non-zero if the tuple associated with $\mathcal{H}$ is a prime tuple. In
view of (38) it is clear we should approximate this with

(47) $\Lambda_R(n; \mathcal{H}) = \frac{1}{k!} \sum_{d \mid P_{\mathcal{H}}(n),\ d \le R} \mu(d) \left( \log \frac{R}{d} \right)^k$

(Here the factor $1/k!$ is a natural normalization.) Notice now that we can approximate
a prime tuple with a single divisor sum. This is a big step forward, but when this
approximation is used in the previous analysis one still does not find primes in
tuples, and one ends up obtaining $\Delta \le 0.1339\ldots = 1 - \sqrt{3}/2$. This was disappointing
to Yildirim and me in 2004, but we should have taken to heart the advice: "Never
give up!" It turns out only one more idea is needed to break through the barrier,
and this was discovered by Pintz. The idea is so simple that when I tell it to you I'm
sure you will not believe something like this is what mathematicians get paid for,
but I think it usually is the case that good mathematics comes down to common
sense reasoning.
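
Before turning to that idea, you may want to check the vanishing property of the generalized von Mangoldt function on a computer. The Python sketch below assumes $\Lambda_k(n) = \sum_{d \mid n} \mu(d) (\log(n/d))^k$ and evaluates it at the product $n(n + 2)$, the polynomial attached to the twin prime tuple $\{0, 2\}$, with $k = 2$. The function names and sample values of $n$ are mine.

```python
import math


def mobius(d: int) -> int:
    """Moebius function mu(d), computed by trial division."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # d has a square factor
            result = -result
        p += 1
    return -result if d > 1 else result


def num_distinct_primes(n: int) -> int:
    """Number of distinct prime factors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)


def lambda_k(n: int, k: int) -> float:
    """Assumed form: sum of mu(d) * log(n / d)**k over all divisors d of n."""
    return sum(
        mobius(d) * math.log(n / d) ** k for d in range(1, n + 1) if n % d == 0
    )


k = 2
for n in (11, 13, 15):
    product = n * (n + 2)  # the polynomial for the tuple {0, 2}
    print(n, product, num_distinct_primes(product), round(lambda_k(product, k), 6))
```

For $n = 11$ the product $11 \cdot 13$ has two prime factors and the value is $2 \log 11 \log 13$. For $n = 13$ and $n = 15$ the product has three distinct prime factors and the value vanishes up to rounding.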

The idea is that we have been trying to do too much, when much less would still 
be much more than we need. No one has ever proved there are infinitely many prime 
tuples, so as a start we only need to try to find tuples with SOME primes in them. 
For example, if we have a 1000-tuple which has a total of 1500 prime factors in its 
components, then there must still be at least 500 prime components. Therefore, 
to detect some primes in tuples, you only need to show that $P_{\mathcal{H}}(n)$ has less than
$k + \ell$ prime factors, for some $\ell < k$. Thus we should consider approximations with
$\Lambda_{k+\ell}$, where now we have a new variable $\ell$ to make use of. Hence, we define our
new approximation

(48) $\Lambda_R(n; \mathcal{H}, \ell) = \frac{1}{(k+\ell)!} \sum_{d \mid P_{\mathcal{H}}(n),\ d \le R} \mu(d) \left( \log \frac{R}{d} \right)^{k+\ell}$.

This finally succeeds in proving $\Delta = 0$ using (44) with $k$ large enough and an
appropriate choice of $\ell$. And this also succeeds in proving $\mathcal{S} > 0$ when $r = 1$ in (43)
if the level of distribution $\vartheta > 1/2$. If $\vartheta = 1$ then we find with $r = 1$ that $\mathcal{S} > 0$
if $k = 7$ and $\ell = 1$, so that every admissible 7-tuple has two primes in it infinitely
often under this assumption. While these results are a great advance forward, they
are still only the camel's nose in the tent, since if $r \ge 2$ we fail to show $\mathcal{S} > 0$ even
if the level of distribution is $\vartheta = 1$.



7. A Curious History

The history behind the development of the method I have just described is much
more convoluted than you might guess. The detection method in (43) is nearly
identical to a method Selberg introduced in 1950 for proving $n$ and $n + 2$ will together
have 5 or fewer prime factors infinitely often (see [12]). This method was generalized
by Heath-Brown [5] in 1997 to $k$-tuples, and this work contains both the detection
method and the approximation (48) in the case $\ell = 1$. However the approach was
never directly applied to primes and was never viewed as a possible lower bound
method. From 1999 until 2003 Yildirim and I were working on special cases of the
approximation (41) and the formula (42). However we saw the problem in terms
of probability and approximation of moments, and never considered expressions
like $\mathcal{S}$. In 2003 Yildirim and I thought we had proved Theorem 1 with a new
approximation somewhat similar to (47) but more complicated and based partly
on guesswork. We had no idea at the time of the relevance of the generalized von
Mangoldt function. In examining our proof, Granville and Soundararajan simplified
the original moment method into the form of (43) and (44), and then found a fatal
mistake: our approximation did not actually have asymptotic formulas in (42).
Returning to the old approximation (41), we were able to use this new detection
method and complete the proof that $\Delta \le 1/4$, but until mid-2004 we had no evidence
of any improved approximations, and were ready to believe they did not exist. Also
in early 2004 Green and Tao were looking for a special type of sieve bound for
primes and numbers with a few prime factors in tuples, but could not find anything
appropriate in the literature. Granville brought to their attention a manuscript of
our work, and they were able to use the asymptotic formula for the first sum in (42)
in their celebrated work on finding arbitrarily long strings of primes in arithmetic
progressions. The feature they needed in this formula is that each component
of the tuple can be shifted individually in the formulas while the other factors
remain unchanged. In sieve methods one does not use expressions like (41), but
rather expressions like (47) and (48), where changing one component of the tuple
changes the entire polynomial $P_{\mathcal{H}}(n)$. Since we were using a moment method,
the approximation (41) naturally arises from multiplying out moments and getting
products of approximations. However for small gaps between primes, it is the sieve
approximations which are more effective, but we did not realize this until mid-2004.
In retrospect, once one has the Granville and Soundararajan formulation of
the problem, it is a small step to move to the sieve type approximations. Ironically,
the approximation (41) has now been discarded in our final work, but perhaps it
might still play some future role in the study of twin primes.

What has been left out of the above account of our work is the frequent contact 
with other mathematicians who freely provided their ideas and suggestions. Many 
times these were decisive for getting back on track and moving the work forward. 
I think in this internet age of quick and easy contact we can take advantage of 
the experience of the worldwide community of mathematicians in our field, always 
being careful however to not waste someone's time. 

Finally, while there is no evidence of this left in the final work, at many stages 
a mathematical package (in my case Mathematica) was the only tool we had for 
testing ideas and experimenting with guesses. Having not grown up with them it 
is always an effort for me to use these programs, but there were many stages of the 
work when I would have quit if not for the information they provided. 



8. Some References for Further Study 

For background information one can in the first place study any beginning number
theory book. At a somewhat more difficult level, most elementary results
mentioned in this paper can be found in Hardy and Wright [8]. For the Riemann
zeta-function, the classic reference is Titchmarsh [14]. For sieve methods, the recent
book of Cojocaru and Murty [2] makes for interesting reading, while the classical
reference is Halberstam and Richert [6]. One should also study the long article on
sieves by Selberg. My favorite research papers related to this area are the 1923
paper of Hardy and Littlewood, the 1965 paper of Bombieri and Davenport [1],
and Selberg's famous 4 page 1947 paper introducing the Selberg sieve. The recent
work of Goldston-Pintz-Yildirim is available in a short proof, and there is a
very accessible exposition in [13].



Footnote. Up until 2004 it had only been proved that there are arithmetic progressions of three primes
infinitely often.